Communications Psychology

Springer Science and Business Media LLC

Preprints posted in the last 7 days, ranked by how well they match Communications Psychology's content profile, based on 20 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Ad-verse Effects: Pharmaceutical Advertising Shifts Drug Recommendations by Consumer-Facing AI

Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.

2026-04-16 health policy 10.64898/2026.04.14.26350868 medRxiv
Top 0.4%
0.6%

Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response sub-analysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.

2
An independent supervisory safety agent improves reaction of large language models to suicidal ideation

Trivedi, S.; Simons, N. W.; Tyagi, A.; Ramaswamy, A.; Nadkarni, G. N.; Charney, A. W.

2026-04-15 psychiatry and clinical psychology 10.64898/2026.04.13.26350757 medRxiv
Top 0.7%
0.3%

Background: Large language models (LLMs) are increasingly used in mental health contexts, yet their detection of suicidal ideation is inconsistent, raising patient safety concerns. Objective: To evaluate whether an independent safety monitoring system improves detection of suicide risk compared with native LLM safeguards. Methods: We conducted a cross-sectional evaluation using 224 paired suicide-related clinical vignettes presented in a single-turn format under two conditions (with and without structured clinical information). Native LLM safeguard responses were compared with an independent supervisory safety architecture with asynchronous monitoring. The primary outcome was detection of suicide risk requiring intervention. Results: The supervisory system detected suicide risk in 205 of 224 evaluations (91.5%) versus 41 of 224 (18.3%) for native LLM safeguards. Among 168 discordant evaluations, 166 favored the supervisory system and 2 favored the LLM (matched odds ratio ≈ 83.0). Both systems detected risk in 39 evaluations, and neither in 17. Detection was highest in scenarios with explicit suicidal ideation and lower in more ambiguous presentations. Conclusions: Native LLM safeguards frequently failed to detect suicide risk in this structured evaluation. An independent monitoring approach substantially improved detection, supporting the role of external safety systems in high-risk mental health applications of LLMs.
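
The paired counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code; the variable names are assumptions, and the odds ratio is the usual matched-pairs estimate (ratio of the two discordant cells).

```python
# Paired 2x2 counts reported in the abstract (224 evaluations in total).
both_detected = 39       # supervisory system and native safeguards both flagged risk
neither_detected = 17    # neither system flagged risk
supervisor_only = 166    # discordant: only the supervisory system flagged risk
llm_only = 2             # discordant: only the native LLM safeguards flagged risk

total = both_detected + neither_detected + supervisor_only + llm_only
supervisor_rate = (both_detected + supervisor_only) / total
llm_rate = (both_detected + llm_only) / total

# For matched pairs, the conditional odds ratio uses only the discordant cells.
matched_odds_ratio = supervisor_only / llm_only

print(total)
print(round(100 * supervisor_rate, 1), round(100 * llm_rate, 1))
print(matched_odds_ratio)
```

The four cells sum to the 224 evaluations, and the marginal rates and the odds ratio agree with the figures quoted in the abstract.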

3
Neural Sensitivity to Word Frequency Modulated by Morphological Structure: Univariate and Multivariate fMRI Evidence from Korean

Kim, J.; Lee, S.; Nam, K.

2026-04-16 neuroscience 10.1101/2025.11.20.689262 medRxiv
Top 1%
0.2%

A central question in psycholinguistics is whether morphologically complex words are obligatorily decomposed into stems and affixes during visual word recognition or whether whole-word access can occur when forms are frequent and familiar. The present study investigated how morphological complexity and lexical frequency jointly shape neural responses by leveraging Korean nominal inflection, whose transparent stem-suffix structure permits a clean dissociation between base (stem) frequency and surface (whole-word) frequency. Twenty-five native Korean speakers completed a rapid event-related fMRI lexical decision task involving simple and inflected nouns that varied parametrically in both frequency measures. Representational similarity analysis (RSA) revealed robust encoding of surface frequency, but not base frequency, in the inferior frontal gyrus (IFG) pars opercularis and supramarginal gyrus (SMG), with significantly stronger correlations for inflected than simple nouns. Univariate analyses converged with this result: surface frequency selectively increased activation for inflected nouns in inferior parietal regions, whereas base frequency showed no reliable effects in any ROI. These findings challenge models positing obligatory pre-lexical decomposition, instead supporting accounts in which morphological processing is shaped by post-lexical, usage-driven lexical statistics. Taken together, our findings support a distributed perspective on morphological processing, suggesting that structural and statistical factors jointly constrain access to morphologically complex forms.

4
Social mobility and long-term episodic memory in Britain

Tampubolon, G.

2026-04-13 epidemiology 10.64898/2026.04.12.26350709 medRxiv
Top 2%
0.1%

Population ageing increases the importance of cognitive capacity for making decisions about retirement and living independently beyond it. We tested whether post-war educational expansion and working-life social mobility eliminate the association between social class of origin and cognition in early old age using the 1958 National Child Development Study. Two outcomes were analysed at age 62: standard episodic memory (immediate + delayed word recall) and long-term episodic memory, capturing accurate half-century recall of childhood household facts (rooms and people at age 11 validated against mothers' responses). Social mobility trajectories derived in prior work were classified into predominantly manual versus non-manual class trajectories. Models were estimated separately for women and men across three specifications: (i) social origin and controls, (ii) adding social mobility, and (iii) adding weighting to address healthy survivor bias. Education was consistently associated with both outcomes. For long-term episodic memory, social origin gradients were clearer than for short-term episodic memory, with men from service/professional origins showing a 13 percentage-point higher probability of accurate half-century recall than men from manual origins. These findings indicate that education expansion and working-life social mobility failed to release the grip of social origin on long-term episodic memory.

5
Patient-Centred Communication in Lung Cancer Screening: A Clinically Focussed Evaluation of a Fine-Tuned Open-Source Model Against a Larger Frontier System

Khanna, S.; Chaudhary, R.; Narula, N.; Lee, R.

2026-04-11 oncology 10.64898/2026.04.10.26350595 medRxiv
Top 2%
0.1%

Lung cancer screening saves lives, yet uptake remains suboptimal and inequitable. Personalised communication can improve attendance and reduce anxiety, but scaling such support is a workforce challenge. We fine-tuned Google's Gemma 2 9B using QLoRA on 5,086 synthetic screening conversations and compared it against Google's Gemini 2.5 Flash (a larger frontier model) and an unmodified baseline across 300 multi-turn conversations with 100 patient personas spanning ten clinical categories. Evaluation combined automated natural language processing metrics with independent language model judgement in two complementary modes: structured clinical rubric and simulated patient persona. The fine-tuned model achieved the highest simulated patient experience score (3.71/5 vs 3.65 for the frontier model), recorded zero boundary violations after clinician review of all flagged instances, and led on the four most safety-critical categories. A composite Patient Adaptation Index showed that the fine-tuned model led overall (0.37 vs 0.35 vs 0.35), with its clearest advantage on the two clinically specific components: empathy calibration to patient distress and selective smoking cessation signposting. These findings suggest that targeted fine-tuning of open-source models can yield clinical communication quality comparable to larger proprietary systems, with advantages in safety-critical scenarios and suitability for NHS data governance constraints. Human clinician review of these conversations is ongoing.

6
Democratizing Scientific Publishing: A Local, Multi-Agent LLM Framework for Objective Manuscript Editing

Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.

2026-04-17 health informatics 10.64898/2026.04.13.26350761 medRxiv
Top 2%
0.1%

Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.

7
Public involvement and co-design of longitudinal studies of sleep health alongside young people with rare genetic conditions

Clayton, J. P.; Haddon, J. E.; Hall, J.; Attwood, M.; Jarrold, C.; Berndt, L. C. S.; Saka, A.; van den Bree, M. B. M.; Jones, M. W.; Collaboration: Sleep Detectives Lived Experience Advisory Panel

2026-04-13 psychiatry and clinical psychology 10.64898/2026.04.07.26348880 medRxiv
Top 2%
0.1%

Background: The mechanisms underpinning associations between sleep and psychiatric conditions are poorly understood, partly due to challenges with longitudinal sleep studies outside the laboratory. Children and young people with rare genetic conditions caused by micro-deletions or -duplications (Copy Number Variants or CNVs) have increased risk of disrupted sleep and poorer neurodevelopmental (ND) outcomes. The Sleep Detectives study aims to investigate this by tracking behavioural and neurophysiological signatures of sleep health in young people with ND risk or ND-CNVs. To optimally achieve this, we have worked with families with ND-CNVs and charity partners to co-design our tools, methods, study protocol, and materials. Method: We established a Lived Experience Advisory Panel (LEAP) with nine parents and 13 children and young people with ND-CNVs, alongside representatives of UK charities Max Appeal and Unique. Together, the research team and LEAP co-designed two in-person family workshops in which we collected feedback on the acceptability of sleep monitoring devices, the design of bespoke cognitive tasks, and overall study protocol. Informal interviews and surveys were conducted with LEAP members and researchers, to enable the team to reflect and learn from their Patient/Public Involvement (PPI) experiences. Results: Key outputs included pre-workshop invitation and briefing materials and insights that iteratively refined the main study design, including the need for flexibility to increase accessibility, selection of sleep devices, customisation of cognitive tasks, and choice of language in documents. The PPI process was highly valued by LEAP members, workshop attendees, and the research team. One investigator described the PPI work as "reinvigorating my love of research by helping me focus on science that matters". Participating families also established peer support networks. Conclusions: Involving families affected by ND-CNVs in co-designing the Sleep Detectives study maximised opportunities for acceptability, accessibility and scalability. The research team gained inspiration and deeper understanding of the impact of ND-CNVs on families. Families gained awareness about research, established connections with each other and peer support, and were enthusiastic about future research involvement. This experience empowered families to engage more deeply with the research process and helped the PPI work to be more impactful and inclusive. Plain English summary: Children and young people with rare genetic conditions caused by small deletion or duplication of genetic material are more likely to experience sleep difficulties such as insomnia, restless sleep, and tiredness. They also show an increased likelihood of neurodevelopmental conditions such as learning disability and autism, and mental health issues such as anxiety. The Sleep Detectives team wanted to explore how these genetic conditions affect children's sleep, cognition and psychiatric health. To make sure that the project design was well suited to the children and young people that would be invited to participate, the team worked closely with families to design the study. Parents and caregivers of affected children and young people were invited to join a Lived Experience Advisory Panel (LEAP), together with charity representatives and Sleep Detectives researchers, to co-design two hands-on workshops, and advise on study design. Children and young people and parents/caregivers attending the workshops tried out and provided feedback on tools and devices that the research team were developing. They also advised on the arrangements and support families might need whilst taking part, and on the study protocol. This collaborative approach helped ensure the study design was optimally suited for the recruitment and participation of children and young people and their families. This report documents our public involvement work for the Sleep Detectives study, illustrating the difference the partnership between researchers and families has made to the project, and the wider benefits for all concerned.

8
Perceived vs. actual navigation ability: Differences between autistic and typically developing children

McKeown, D. J.; Cruzado, O. S.; Colombo, G.; Angus, D. J.; Schinazi, V. R.

2026-04-13 psychiatry and clinical psychology 10.64898/2026.04.09.26350542 medRxiv
Top 2%
0.1%

Purpose: Navigational ability develops throughout childhood alongside the maturation of brain regions supporting egocentric and allocentric processing. In Autism Spectrum Disorder (ASD), atypical hippocampal development may impact flexible spatial memory; however, findings on navigational ability in autistic children remain inconsistent. This study aimed to compare both objective and perceived navigation ability in children with ASD and typically developing (TD) peers. Method: Twenty-six children with high-functioning ASD and twenty-five age- and gender-matched TD children (mean age = 12.04 years, SD = 1.64) completed a battery of navigational tasks from the Spatial Performance Assessment for Cognitive Evaluation (SPACE), including Path Integration, Egocentric Pointing, Mapping, Associative Memory, and Perspective Taking. Perceived navigation ability was assessed using the Santa Barbara Sense of Direction (SBSOD) scale. Results: No significant group differences were observed across any objective navigation tasks. However, children with ASD reported significantly lower perceived navigation ability compared to TD peers. Conclusion: These findings suggest a dissociation between perceived and actual navigational ability in ASD. By early adolescence, objective navigation performance appears intact, potentially reflecting sufficient maturation of underlying neural systems or the presence of compensatory mechanisms. The results underscore the importance of incorporating objective, task-based measures when assessing cognitive abilities in autistic populations.

9
Why Invariant Risk Minimization Fails on Tabular Data: A Gradient Variance Solution

Mboya, G. O.

2026-04-13 epidemiology 10.64898/2026.04.09.26350513 medRxiv
Top 2%
0.1%

Machine learning models trained on observational data from one environment frequently fail when deployed in another, because standard learning algorithms exploit spurious correlations alongside causal ones. Invariant learning methods address this problem by seeking representations that support stable prediction across training environments, but their behavior on tabular data remains poorly characterized. We present CausTab, a gradient variance regularization framework for causal invariant representation learning on mixed tabular data. CausTab penalizes the variance of parameter gradients across training environments, providing a richer invariance signal than the scalar penalty used by Invariant Risk Minimization (IRM). We provide formal results showing that the gradient variance penalty is zero at causally invariant solutions and positive at solutions that rely on spurious features. Through experiments on synthetic data across three spurious-correlation regimes, four cycles of the National Health and Nutrition Examination Survey (NHANES), and four hospital systems in the UCI Heart Disease dataset, we demonstrate that: (1) IRM consistently degrades relative to standard empirical risk minimization (ERM) on tabular data, losing up to 13.8 AUC points in spurious-dominant settings, a failure we trace mechanistically to penalty collapse during training; (2) CausTab matches or exceeds ERM in every experimental condition; (3) CausTab achieves consistently better probability calibration than both ERM and IRM; and (4) invariant learning methods fail when environments differ in outcome prevalence rather than in spurious feature correlations, a boundary condition we characterize both empirically and theoretically. We introduce the Spurious Dominance Index (SDI), a practical scalar diagnostic for determining whether a dataset requires invariant learning, and validate it across all experimental settings.
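
To make the idea of a gradient variance penalty concrete, here is a toy sketch. It is not the CausTab implementation: the squared-error linear model, the noise-free two-environment data, and the function names `env_gradient` and `gradient_variance_penalty` are all assumptions chosen for illustration. It only shows the qualitative property the abstract describes, namely a penalty of zero at a solution that uses the causal feature and a positive penalty at one that uses a spurious feature.

```python
def env_gradient(w, rows):
    """Gradient of mean squared error for a linear model on one environment."""
    grad = [0.0] * len(w)
    for x, y in rows:
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj / len(rows)
    return grad


def gradient_variance_penalty(w, environments):
    """Sum over parameters of the variance of per-environment gradients."""
    grads = [env_gradient(w, rows) for rows in environments]
    penalty = 0.0
    for j in range(len(w)):
        values = [g[j] for g in grads]
        mean = sum(values) / len(values)
        penalty += sum((v - mean) ** 2 for v in values) / len(values)
    return penalty


# Toy data (hypothetical): y = 2 * x1 in both environments.
# The second feature tracks y in environment A and flips sign in environment B.
env_a = [((x1, 2.0 * x1), 2.0 * x1) for x1 in (1.0, 2.0, 3.0)]
env_b = [((x1, -2.0 * x1), 2.0 * x1) for x1 in (1.0, 2.0, 3.0)]
environments = [env_a, env_b]

invariant_w = [2.0, 0.0]  # relies only on the causal feature x1
spurious_w = [0.0, 1.0]   # relies only on the spurious feature x2

print(gradient_variance_penalty(invariant_w, environments))
print(gradient_variance_penalty(spurious_w, environments))
```

In this toy setting the invariant weights fit both environments exactly, so every per-environment gradient is zero and the penalty vanishes; the spurious weights fit only environment A, so the gradients disagree and the penalty is positive.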

10
HAARF: Healthcare AI Agents Regulatory Framework - A Comprehensive Security Verification Standard for Autonomous AI Systems in Clinical Environments

Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.

2026-04-13 health systems and quality improvement 10.64898/2026.04.09.26350519 medRxiv
Top 2%
0.1%

As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]). Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
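
The interval quoted in this abstract for 0 successes in 50 trials can be reproduced with the standard Wilson score formula. The sketch below is illustrative and not taken from the paper; the helper name `wilson_interval` and the default z value (approximately the two-sided 95% normal quantile) are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (z of about 1.96 gives ~95%)."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials ** 2)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


lower, upper = wilson_interval(0, 50)
print(round(lower, 2), round(upper, 2))
```

With zero observed events the upper bound simplifies to z²/(n + z²), about 0.071 for n = 50, which rounds to the 0.07 reported; a zero observed rate in 50 trials is therefore still compatible with a true rate of several percent.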

11
Clinical and Genetic Evaluation of Suicide Death with and without Interpersonal Trauma Exposure

Monson, E. T.; Shabalin, A. A.; Diblasi, E.; Staley, M. J.; Kaufman, E. A.; Docherty, A. R.; Bakian, A. V.; Coon, H.; Keeshin, B. R.

2026-04-16 psychiatry and clinical psychology 10.64898/2026.04.14.26350901 medRxiv
Top 3%
0.1%

Importance: Suicide is a leading cause of death in the United States, with risk strongly influenced by interpersonal trauma, which contributes to treatment resistance and clinical complexity. Objective: To assess clinical and genetic factors in individuals who died from suicide, with and without interpersonal trauma exposure. Design: Individuals who died from suicide with and without trauma were compared in a retrospective case-case design. Prevalence of 19 broad clinical categories was assessed between groups. Results directed selection of 42 clinical subcategories and 40 polygenic scores (PGS) for further assessment. Multivariable logistic regression models, adjusted for critical covariates and multiple tests, were formulated. Models were also stratified by age group (<26 and ≥26 years), sex, and age/sex. Setting: A population-based evaluation of comorbidity and polygenic scoring in two suicide death subgroups. Participants: A total of 8,738 Utah Suicide Mortality Research Study individuals (23.9% female, average age = 42.6 years) who died from suicide were evaluated, divided into trauma-exposed (N = 1,091) and non-trauma-exposed (N = 7,647) individuals. A subset of unrelated European genotyped individuals was also assessed in PGS analyses (Trauma N = 491; Non-trauma N = 3,233). Exposures: Trauma is here defined as interpersonal trauma exposure, including abuse, assault, and neglect from International Classification of Disease coding. Main Outcomes and Measures: Prevalence of comorbid clinical sub/categories and PGS enrichment in trauma-exposed versus non-trauma-exposed suicide deaths. Results: Overall, trauma-exposed individuals died from suicide earlier (mean age of 38.1 versus 43.3 years; P < 0.0001) and were disproportionately female (38% versus 21%, OR = 3.3, CI = 2.9-3.8). Prevalence of asphyxiation and overdose methods, prior suicidality, psychiatric diagnoses, and substance use (OR range = 1.3-3.7) were elevated in trauma-exposed individuals who died from suicide. PGS were also elevated in trauma-exposed individuals who died from suicide for depression, bipolar disorder, cannabis use, PTSD, insomnia, and schizophrenia (OR range = 1.1-1.4), with ADHD and opioid use showing uniquely elevated PGS in trauma-exposed males (OR range = 1.2-1.4). Conclusions and Relevance: Results demonstrated multiple convergent lines of age- and sex-specific evidence differentiating trauma-exposed from non-trauma-exposed suicide death. Such findings suggest unique biological backgrounds and may refine identification and treatment of this high-risk group.

12
Analytical Choices Impact the Estimation of Rhythmic and Arrhythmic Components of Brain Activity

da Silva Castanheira, J.; Landry, M.; Fleming, S. M.

2026-04-11 neuroscience 10.1101/2025.09.24.678322 medRxiv
Top 3%
0.0%

Brain activity comprises both rhythmic (periodic) and arrhythmic (aperiodic) components. These signal elements vary across healthy aging and disease, and may make distinct contributions to conscious perception. Despite pioneering techniques to parameterize rhythmic and arrhythmic neural components based on power spectra, the methodology for quantifying rhythmic activity remains in its infancy. Previous work has relied on parametric estimates of rhythmic power extracted from specparam, or estimates of rhythmic power obtained after detrending neural spectra. Variation in analytical choices for isolating brain rhythms from background arrhythmic activity makes interpreting findings across studies difficult. Whether these current approaches can accurately recover the independent contribution of these neural signal elements remains to be established. Here, using simulation and parameter recovery approaches, we show that power estimates obtained from detrended spectra conflate these two neurophysiological components, yielding spurious correlations between spectral model parameters. In contrast, modelled rhythmic power obtained from specparam, which detrends the power spectra and parameterizes brain rhythms, independently recovers the rhythmic and arrhythmic components in simulated neural time series, minimising spurious relationships. We validate these methods using resting-state recordings from a large cohort. Based on our findings, we recommend modelled rhythmic power estimates from specparam for the robust independent quantification of rhythmic and arrhythmic signal components for cognitive neuroscience.

13
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 3%
0.0%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its highest scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4-5 percentage points in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
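
Because this abstract reports both macro- and micro-averaged F1 under severe label imbalance, a small sketch of how the two averages differ may help. The per-label counts below are invented for illustration and have nothing to do with the HCC-Coder results; the helper name `f1` is likewise an assumption.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined here as 0.0 when the label never occurs or is predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-label counts: (true positives, false positives, false negatives).
counts = {
    "common_label": (90, 10, 10),
    "rare_label": (1, 1, 3),
}

# Macro-F1: average the per-label F1 scores, so every label counts equally.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro-F1: pool the counts across labels first, so frequent labels dominate.
micro_f1 = f1(*(sum(col) for col in zip(*counts.values())))

print(round(macro_f1, 3), round(micro_f1, 3))
```

Macro-F1 is sensitive to performance on rare labels, while micro-F1 mostly reflects the frequent ones, which is why both are typically reported for imbalanced multi-label tasks.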

14
Trajectories of physical activity components among community-dwelling older adults.

Hoogerheide, B.; Maas, E.; Visser, M.; Hoekstra, T.; Schaap, L.

2026-04-11 rehabilitation medicine and physical therapy 10.64898/2026.04.10.26350593 medRxiv
Top 3%
0.0%

Background/Objective: Common measures of physical activity (PA) based on duration and intensity do not fully capture its complexity. Adding the PA components of muscle strength, mechanical strain, and turning actions can provide a more complete view of activity behavior. Furthermore, PA behaviors differ between men and women. Therefore, the goal of this study is to identify and cluster similar long-term PA patterns over time for each PA component, examined separately for men and women. Methods: We used data from 4963 participants (52% women; mean age 66 years, SD = 8.6) of the Longitudinal Aging Study Amsterdam (1992 to 2019). PA component scores were assigned to self-reported activities, and Sequence Analysis with Optimal Matching was used to identify and cluster similar activity patterns over a period of 10 years, separately for each component and stratified by sex. Results: PA components varied by sex and displayed a unique mix of trajectories, including predominantly low, medium, or high activity, increasing or decreasing patterns, and trajectories characterized by early or late mortality. Importantly, trajectories remained independent, indicating that changes in one PA component were not linked to changes in others. Conclusion: Older men and women follow distinct and independent long-term PA trajectories across components, underscoring that PA behaviour cannot be described by a single dimension. Significance/Implications: The observed independence and heterogeneity of trajectories suggest that muscle strength, mechanical strain, and turning actions capture meaningful and distinct aspects of PA that are not reflected by traditional measures alone. Future PA strategies could incorporate these dimensions and acknowledge sex-specific patterns to better reflect natural movement. The independence of components suggests that future interventions should target multiple dimensions, as changes in one component may not translate to others. Such an approach may support more tailored and sustainable PA interventions in later life.

15
A case report on gendered biases in a Finnish healthcare AI assistant

Luisto, R.; Snell, K.; Vartiainen, V.; Sanmark, E.; Äyrämö, S.

2026-04-14 health informatics 10.64898/2026.04.09.26350383 medRxiv
Top 3%
0.0%

In this study, we investigate gender bias in a Retrieval-Augmented Generation (RAG) based AI assistant developed for Finnish wellbeing services counties. We tested the system using 36 clinically relevant queries, each rendered in three gendered variants (male, female, gender-neutral), and evaluated responses using both an LLM-as-a-judge approach and a human expert panel consisting of a physician and a sociologist specializing in ethics. We observed substantial and clinically significant differences across gendered variants, including differential treatment urgency, inappropriate symptom associations, and misidentification of clinical context. Female variants disproportionately framed responses around childcare and reproductive health regardless of clinical relevance, reflecting societal stereotypes rather than medical reasoning. Bias manifested both at the LLM generation stage and the RAG retrieval stage, in several cases causing the model to hallucinate responses entirely. Some bias patterns were persistent across repeated runs, while others appeared inconsistently, highlighting the challenge of distinguishing systematic bias from stochastic variation.

16
Five-Domain Accelerometer-Derived Behavioral Exposome and Incident Cancer Risk in UK Biobank

Ni Chan Chin (Chengqin Ni), M.; Berrio, J. A.

2026-04-12 epidemiology 10.64898/2026.04.07.26350369 medRxiv
Top 4%
0.0%

Background: Accelerometer-derived behavioral phenotype captures multidimensional aspects of human behavior extending well beyond physical activity, encompassing light exposure, step counts, physical activity patterns, sleep, and circadian rhythms. Whether these five domains constitute a unified behavioral architecture underlying cancer risk and whether circadian organization and light exposure confer incremental predictive value beyond movement volume alone remains to be comprehensively established. Methods: We conducted an accelerometer-wide association study (AWAS) encompassing the complete accelerometer-derived behavioral exposome across five behavioral domains in UK Biobank participants with valid wrist accelerometry data. Incident solid cancers were designated as the primary endpoint, with prespecified site-specific solid cancers and hematological malignancy as secondary outcomes. Cox proportional hazards models with age as the timescale were used. The minimal covariate set served as the primary reporting tier, followed by sensitivity analyses additionally adjusting for adiposity/metabolic factors, independent activity patterns, shift work history, and accelerometry measurement quality. Nominal statistical significance was defined as two-sided P < 0.05. Results: Among 89,080 participants, 6,598 incident solid cancer events were observed over a median follow-up of 8.39 years. In the minimally adjusted model, the pan-solid-tumor association atlas was dominated by signals from activity volume, inactivity fragmentation, and circadian rhythm. Higher overall acceleration (HR per SD: 0.91, 95% CI: 0.89-0.94) and higher daily step counts (HR: 0.93, 95% CI: 0.90-0.95) were independently associated with reduced solid cancer risk, while inactivity fragmentation metrics were consistently linked to higher risk. Notably, circadian rhythms, most prominently cosinor mesor (Midline Estimating Statistic of Rhythm under cosinor model), emerged as leading inverse risk signals, underscoring the independent contribution of circadian behavioral architecture. Site-specific analyses revealed pronounced heterogeneity across tumor sites. Lung cancer exhibited a robust inverse activity-risk gradient, while breast cancer showed reproducible associations with MVPA. Most strikingly, nocturnal light exposure demonstrated a tumor-site-specific association confined to pancreatic cancer, a signal absent across all other sites examined. Associations for uterine cancer were predominantly inactivity-related and substantially attenuated following adjustment for adiposity and metabolic factors. Conclusions: Across five accelerometer-derived behavioral domains, solid cancers as a whole were most consistently associated with a high-movement, low-fragmentation, and circadian-coherent behavioral profile. While site-specific heterogeneity exists, the broad cancer risk landscape is dominated by movement volume, inactivity fragmentation, and circadian rhythmicity. Light exposure, although more localized in its contribution, demonstrates a potentially novel and specific association with pancreatic cancer risk. These findings support a five-domain behavioral exposome framework for cancer epidemiology and, importantly, position circadian rhythm integrity and nocturnal light exposure as critically understudied dimensions warranting dedicated mechanistic investigation.

17
GPS Mobility Tracking, Ecological Momentary Assessment, and Qualitative Interviewing to Specify How Space Produces Intersectional Health Inequities: Development and Pilot Testing of the Spatial Intersectionality Health Framework (SIHF) and IGEMA Methodology

Cook, S. H.

2026-04-13 epidemiology 10.64898/2026.04.09.26350546 medRxiv
Top 4%
0.0%

Background. Young sexual and gender minorities of color face compound health risks shaped by interlocking systems of racism, cisgenderism, and class inequality. Spatial health research documents that place shapes health, but existing methods cannot specify the mechanisms through which spatial configurations produce different health outcomes for differently positioned people. This gap prevents targeted intervention. Objective. To develop and pilot test the Spatial Intersectionality Health Framework (SIHF), which specifies three mechanisms through which space produces intersectional health inequities: Layered (multiple oppressive systems activating simultaneously), Positional (the same space producing different health pathways by intersectional position), and Conditional (nominally protective spaces carrying hidden costs for specific positions). We also introduce and validate Intersectional Geographically-Explicit Ecological Momentary Assessment (IGEMA) as the methodology operationalizing SIHF across three data levels. Methods. The GeoSense study enrolled 32 young sexual and gender minorities of color (ages 18-29) in New York City. IGEMA was implemented across three integrated levels: (1) GPS mobility tracking via participants' personal smartphones, linked to census tract structural exposure indices across n=19 participants; (2) ecological momentary assessment of intersectional discrimination with multilevel modeling of mood, stress, and sleep outcomes; and (3) map-guided qualitative interviews with SIHF mechanism coding and intercoder reliability assessment across 92 coded records from 18 participants. This study was conducted as the pilot for NIH R01HL169503. Results. All three SIHF mechanisms were empirically detectable. A compound structural gendered racism index outperformed every single-axis alternative in predicting daily mood (b=-0.048, p=.001) and stress (b=0.121, p<.001). The Positional mechanism accounted for 71% of coded harm experiences. Intercoder reliability for mechanism assignment reached kappa=0.824 at Stage 2 reconciliation. Daily intersectional discrimination predicted greater sleep disturbance (b=1.308, p=.004). Conclusions. SIHF and IGEMA together provide an empirically testable framework for specifying how space produces intersectional health inequities. Mechanism specification, not spatial location alone, is the condition for designing research and intervention that reaches the source of harm for multiply marginalized populations.

18
A multidomain intrinsic capacity score tracks longitudinal health trajectories in the UK Biobank

Zhai, T.; Babu, M.; Fuentealba, M.; Al Dajani, S.; Gladyshev, V. N.; Furman, D.; Snyder, M.

2026-04-13 epidemiology 10.64898/2026.04.10.26350621 medRxiv
Top 4%
0.0%

Quantitative measures for tracking functional health have generally been lacking. Intrinsic capacity (IC) has been proposed as an appropriate measure, but its metrics have been derived in small datasets and sparse longitudinal data. Using harmonized measures of cognition, locomotion, sensory function, vitality, and psychological well-being from 501,615 UK Biobank participants followed for a median of 15.5 years, we derived domain-specific and composite IC scores. We examined associations with incident disease, cause-specific mortality, multimorbidity, lifestyle and socioeconomic factors, and multi-omic profiles from Olink proteomics, NMR metabolomics, clinical biochemistry, and blood-cell traits. We found that composite IC declined non-linearly with age, and within-person decline was steeper than the cross-sectional age measures. Participants with greater baseline morbidity, those who subsequently developed incident disease, and those who died earlier in follow-up showed lower IC trajectories across adulthood. The IC domains were only modestly correlated with one another, supporting multidimensionality, yet higher overall IC was associated with lower risk of most diseases examined. The dominant IC domain varied by endpoint, with cognition informative for dementia, sensory function for hearing loss, psychological capacity for depression, locomotion for osteoarthritis, and vitality for cardiometabolic outcomes. IC was also associated cross-sectionally with physical activity, insomnia, smoking, medication burden, and socioeconomic disadvantage. More proteins were found to be predictive of vitality, and enrichment converged on immune/inflammatory and metabolic pathways. Blood-based surrogates recapitulated part of the phenotypic signal, particularly for vitality. Overall, this IC framework captures longitudinal health trajectories and broad disease vulnerability in a large middle- to older-aged cohort, supports IC as a clinically meaningful, multidomain phenotype of aging, and identifies blood-based correlates that may facilitate future at-scale monitoring of aging-related functional decline.

19
Non-genetic component of height as a surrogate marker for childhood socioeconomic position and its association with cardiovascular and brain health: results from HCHS/SOL

Moon, J.-Y.; Filigrana, P.; Gallo, L. C.; Perreira, K. M.; Cai, J.; Daviglus, M.; Fernandez-Rhodes, L. E.; Garcia-Bedoya, O.; Qi, Q.; Thyagarajan, B.; Tarraf, W.; Wang, T.; Kaplan, R.; Isasi, C. R.

2026-04-13 epidemiology 10.64898/2026.04.08.26350438 medRxiv
Top 4%
0.0%

Childhood socioeconomic position (SEP) can have lifelong effects on health. Many studies have used adult height as a surrogate marker for early-life conditions. In this study, we derived the non-genetic component of height, calculated as the residual from sex-specific standardized height regressed on genetically predicted height, as a surrogate for childhood SEP, using data from the Hispanic Community Health Study/Study of Latinos (2008-2011). A positive residual would indicate favorable early-life conditions promoting growth, while a negative residual would indicate early-life adversity that may stunt development. The height residual was associated with early-life variables such as parental education, year of birth, US nativity and age at first migration to the US (50 states/DC), supporting the validity of the height residual as a surrogate for early-life conditions. Furthermore, the height residual was positively associated with better cardiovascular health (CVH) and cognitive function among middle-aged and older adults. Interestingly, among participants younger than 35 years, the height residual was negatively associated with the "Life's Essential 8" clinical CVH scores. These results suggest the non-genetic component of height as a surrogate for childhood environment, with predictive value for CVH and cognitive function.

20
Wearable-derived physiological features for trans-diagnostic disease comparison and classification in the All of Us longitudinal real-world dataset

Huang, X.; Hsieh, C.; Nguyen, Q.; Renteria, M. E.; Gharahkhani, P.

2026-04-13 epidemiology 10.64898/2026.04.07.26350352 medRxiv
Top 4%
0.0%

Wearable-derived physiological features have been associated with disease risk, but most current studies focus on single conditions, limiting understanding of cross-disease patterns. This study adopts a trans-diagnostic approach to examine whether wearable data capture shared and condition-specific physiological signatures across multiple chronic conditions spanning physical and mental health, and then evaluates the utility of these features for disease classification. A total of 9,301 patients with at least 21 days of consecutive Fitbit data from the All of Us Controlled Tier Dataset version 8 were analyzed. Disease subcohorts included cardiovascular disease (CVD), diabetes, obstructive sleep apnea (OSA), major depressive disorder (MDD), anxiety, bipolar disorder, and attention-deficit/hyperactivity disorder (ADHD), chosen based on prevalence and relevance. Logistic regression and XGBoost models were fitted for each disease subcohort versus the control cohort. We found that, compared to using just baseline demographic and lifestyle features, incorporating wearable-derived features improved classification performance in all subcohorts for both models, except for ADHD, where improvement was mainly observed for ROC-AUC in the logistic regression model, likely due to the smaller sample size of the ADHD subcohort. The largest performance gains were observed in MDD (increase in ROC-AUC of 0.077 for logistic regression, 0.071 for XGBoost; p < 0.001) and anxiety (increase in ROC-AUC of 0.077 for logistic regression, 0.108 for XGBoost; p < 0.001). This study provides one of the first comprehensive transdiagnostic evaluations of wearable-derived features for disease classification, highlighting their potential to enhance risk stratification in the real-world setting as a practical complement to clinical assessments and providing a foundation to explore more fine-grained wearable data.
Author summary: Wearable devices such as fitness trackers and smartwatches are becoming increasingly popular and affordable, providing continuous measurements of heart rate, physical activity, and sleep. Alongside the growing digitization of health records, this creates new opportunities for large-scale, real-world health studies. In this study, we analyzed wearable-derived physiological patterns across a range of chronic conditions spanning both physical and mental health to better understand how these signals relate to disease risk. We found that incorporating wearable-derived heart rate, activity and sleep features improved disease risk classification across several conditions, with particularly strong gains for major depressive disorder and anxiety. By examining how individual features contributed to model predictions, we also identified meaningful associations between physiological signals and disease risk. For example, both duration and day-to-day variation of deep and rapid eye movement (REM) sleep were associated with increased risk in certain conditions. Our study supports the development of real-time, automated tools to assess disease risk alongside clinical care.